# Low WER

Whosper Large V2

Whosper-large-v2 is a cutting-edge speech recognition model specifically designed for Wolof, the primary language of Senegal. Built upon OpenAI's Whisper-large-v2, it significantly improves Word Error Rate (WER) and Character Error Rate (CER).
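Many entries in this list quote WER or CER figures. As a rough illustration of what those metrics measure, here is a minimal sketch that computes both from a reference transcript and a model hypothesis using edit distance. The function names (`edit_distance`, `wer`, `cer`) are hypothetical helpers written for this example, not part of any model's tooling. It assumes whitespace tokenization and no text normalization, and it counts spaces as characters for CER; published figures often normalize case and punctuation first, so scores from this sketch may not match reported numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref item missing in hyp)
                curr[j - 1] + 1,     # insertion (extra item in hyp)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()  # assumption: plain whitespace tokens
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # assumption: spaces are counted as characters
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER: {wer(ref, hyp):.3f}")
    print(f"CER: {cer(ref, hyp):.3f}")
```

A lower value is better for both metrics. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.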

Speech Recognition Supports Multiple Languages

Whisper Hindi2Hinglish Swift

A speech recognition model for mixed Hindi-Hinglish speech, built on the Whisper architecture and tuned for Indian accents and noisy environments.

Speech Recognition

Transformers Supports Multiple Languages

Viwhisper Medium

Whisper-medium model optimized for Vietnamese speech recognition tasks, fine-tuned on 1308 hours of Vietnamese data

Speech Recognition

Transformers Other

Parakeet Ctc 0.6b

Parakeet CTC 0.6B is an automatic speech recognition model jointly developed by NVIDIA NeMo and Suno.ai, based on the FastConformer architecture with approximately 600 million parameters, supporting English speech transcription.

Speech Recognition English

Parakeet Rnnt 0.6b

Parakeet RNNT 0.6B is an automatic speech recognition model jointly developed by NVIDIA NeMo and Suno.ai, based on the FastConformer architecture with approximately 600 million parameters, specifically designed for transcribing English speech into text.

Speech Recognition English

Parakeet Ctc 1.1b

Parakeet CTC 1.1B is an automatic speech recognition model jointly developed by NVIDIA NeMo and Suno.ai, based on the FastConformer architecture with approximately 1.1 billion parameters, supporting English speech transcription.

Speech Recognition English

Whisper Large V3 French

A French automatic speech recognition model fine-tuned from OpenAI's Whisper-large-v3 that predicts casing, punctuation, and numbers.

Speech Recognition

Transformers French

Asr Whisper Medium Commonvoice Ar

A Whisper medium speech recognition model fine-tuned on the CommonVoice Arabic dataset, developed by the SpeechBrain team

Speech Recognition Arabic

Stt En Fastconformer Transducer Xlarge

The NVIDIA FastConformer-Transducer is a high-performance model for English automatic speech recognition (ASR), utilizing an optimized FastConformer architecture and Transducer decoder with approximately 618 million parameters.

Speech Recognition English

Stt En Fastconformer Ctc Xlarge

NVIDIA FastConformer-CTC XLarge is an Automatic Speech Recognition (ASR) model with approximately 600 million parameters, designed specifically for English speech transcription and trained using the FastConformer architecture and CTC loss.

Speech Recognition English

Whisper Small Cv11 French

A French automatic speech recognition model fine-tuned from openai/whisper-small on the Common Voice 11.0 French dataset; it predicts casing and punctuation.

Speech Recognition

Transformers French

Stt Rw Conformer Transducer Large

This is a large Conformer-Transducer model for Kinyarwanda speech recognition, which can transcribe speech into lowercase Latin letters, supporting spaces and apostrophes.

Speech Recognition Other

Stt Es Conformer Transducer Large

This is a large Conformer-Transducer model for Spanish automatic speech recognition, with approximately 120 million parameters, trained on 1340 hours of Spanish speech data.

Speech Recognition Spanish

Stt De Conformer Transducer Large

This is a large Conformer-Transducer model for German automatic speech recognition, with approximately 120 million parameters, supporting the transcription of German speech into text.

Speech Recognition German

Stt De Conformer Ctc Large

This is a large-scale Conformer-CTC model for German automatic speech recognition, trained and optimized by NVIDIA on thousands of hours of German speech data.

Speech Recognition German

Wav2vec2 Large Xlsr 53 Chinese Zn Cn Aishell1

A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the AISHELL-1 dataset.

Speech Recognition

Transformers Chinese

Wav2vec2 Large Xlsr 53 German Cv9

This is an automatic speech recognition (ASR) model fine-tuned on the German Common Voice 9.0 dataset, based on Facebook's wav2vec2-large-xlsr-53 model.

Speech Recognition

Transformers German

Wav2vec2 Base Vietnamese 160h

Vietnamese speech recognition model based on Wav2vec2, fine-tuned on 160 hours of Vietnamese speech data

Speech Recognition

Transformers Other

Wav2vec2 Base Da Ft Nst

Danish speech recognition model fine-tuned on the NST dataset; it expects audio input sampled at 16 kHz.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Turkish

An automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Turkish Common Voice dataset, achieving a test WER of 21.13%.

Speech Recognition Other

Bp500 Base100k Voxpopuli

Speech recognition model optimized for Brazilian Portuguese, trained with 453 hours of audio from 7 public datasets

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Sundanese

A Sundanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on high-quality TTS data from OpenSLR

Speech Recognition Other

Asr Wav2vec2 Commonvoice Fr

A wav2vec 2.0 speech recognition model trained on the CommonVoice French dataset, using a CTC/Attention architecture that does not require a language model.

Speech Recognition French

A Wav2vec 2.0 speech recognition model fine-tuned on Brazilian Portuguese datasets, supporting automatic speech recognition tasks for Brazilian Portuguese.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Eo

A speech recognition model fine-tuned for Esperanto using the Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.

Speech Recognition Other

Wav2vec2 Large Xlsr Open Brazilian Portuguese V2

This is a Wav2vec2 model optimized for Brazilian Portuguese, trained on multiple open datasets for automatic speech recognition tasks.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Open Brazilian Portuguese

This is a Wav2vec 2.0 model fine-tuned for Brazilian Portuguese, trained using multiple open Brazilian Portuguese datasets including Common Voice, MLS, CETUC, etc.

Speech Recognition

Transformers Other

Wav2vec2 Base Cynthia Tedlium 2500 V2

A speech recognition model fine-tuned from facebook/wav2vec2-base-960h on the TED-LIUM dataset, achieving a word error rate of 20.33% on the evaluation set.

Speech Recognition

This is a Wav2vec 2.0 model fine-tuned for Brazilian Portuguese, trained on multiple Brazilian Portuguese datasets, achieving a WER of 13.6 on the Common Voice test set.

Speech Recognition

Transformers Other

Wav2vec2 Live Japanese

A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 that outputs hiragana.

Speech Recognition

Transformers Japanese

Wav2vec2 Large Xlsr 53 Es

A speech recognition model fine-tuned from Facebook's wav2vec2-large-xlsr-53 on the Spanish Common Voice dataset, with a test WER of 10.50%.

Speech Recognition

Transformers Spanish

Wav2vec2 Large Xlsr 53 Esperanto

This is an Esperanto speech recognition model fine-tuned from Facebook's wav2vec2-large-xlsr-53 model, trained using the Common Voice dataset.

Speech Recognition Other

Xls R Nl V1 Cv8 Lm

This is an automatic speech recognition model based on the XLS-R architecture, specifically optimized for Dutch and Flemish, incorporating a 5-gram language model to improve recognition accuracy.

Speech Recognition

Transformers Other

An automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on a Galician dataset, achieving a WER of 11.31% on the Common Voice 8.0 test set.

Speech Recognition

Transformers Other

© 2025 AIbase